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ABSTRACT 


A  technique  which  yields  fast  parallel  algorithms  for  several  zero-one  supply- 
demand  problems  is  presented.  We  give  NC  algorithms  for  the  following 
related  problems: 

(1)  Given  a  sequence  of  supplies  au  .  .  .  ,a„  and  demands  bu  .  .  .  ,bm,  con¬ 
struct  a  zero-one  flow  pattern  satisfying  these  constraints,  where  every  supply 
vertex  can  send  at  most  one  unit  of  flow  to  each  demand  vertex. 


(2)  Given  a  sequence  of  positive  and  negative  integers  summing  to  zero, 
representing  supplies  and  demands  respectively,  construct  a  zero-one  flow  pat¬ 
tern  so  that  the  net  flow  out  of  (into)  each  vertex  is  its  supply  'demand),  where 
every  vertex  can  send  at  most  one  unit  of  flow  to  every  other  vertex. 

(3)  Construct  a  digraph  without  self-loops  with  specified  in-  and  out-degrees. 


We  extend  our  results  to  the  case  where  the  input  represents  upper 
bounds  on  supplies  and  lower  bounds  on  demands 


1.  Introduction 

Supply-demand  problems  are  fundamental  in  combinatorial  optimization 
([FF],[La]).  In  one  formulation  of  the  problem  the  input  is  a  network  in  which  each 
arc  has  a  non-negative  capacity,  and  each  vertex  has  a  certain  supply  or  demand 
(possibly  zero).  The  task  is  to  find  a  flow  function,  such  that  the  flow  through  each 
arc  is  no  more  than  its  capacity  and  the  difference  between  the  flow  into  a  vertex 
and  out  of  it  is  equal  to  its  supply  (or  demand).  This  problem  is  equivalent  to  the 
general  max  flow  problem,  and  can,  therefore,  be  solved  efficiently  sequentially 
([La],[PS],[GT]),  but  probably  has  no  efficient  parallel  solution,  since  it  is  P- 
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complete  ([GSS]).  There  are,  however,  many  interesting  special  cases  of  this  prob¬ 
lem  whose  solutions  do  not  require  the  full  power  of  general  max  flow. 

In  this  paper  we  are  concerned  with  several  such  problems.  The  first  problem 
we  discuss  is:  given  a  sequence  of  supplies,  au  .  .  .  ,an,  and  demands, 

blt  .  .  .  ,bm,  construct  a  zero-one  flow  pattern  satisfying  these  constraints,  where 

every  supply  vertex  can  send  at  most  one  unit  of  flow  to  each  demand  vertex. 
Equivalently,  we  can  state  this  problem  as  that  of  constructing  a  zero-one  matrix, 
M,  having  a,  l’s  in  the  i’th  row  and  bj  l’s  in  the  j’th  column  (for  all 

\<i<n  ,  1  We  will  refer  to  this  problem  ^  as  the 

matrix  construction  problem.  M  is  called  a  realization  for  the  input  (a,b).  There  is 
a  simple  sequential  algorithm  for  constructing  a  realization  if  one  exists  ([FF],[G]). 
select  any  row,  assign  its  l’s  to  the  columns  having  largest  column  sums  and 
repeat  this  procedure  in  the  reduced  problem.  If  this  procedure  gets  stuck  (i.e. 
some  column  sum  becomes  negative),  then  no  realization  exists. 


This  algorithm,  although  easy  to  implement  sequentially,  seems  very  hard  to 
parallelize.  Thus  it  is  natural  to  ask  if  there  is  a  fast  parallel  algorithm  for  this 
problem.  Two  remarks  are  relevant  to  this  question:  first,  the  problem  can  be 
solved  by  network  flow  techniques.  Since  the  capacities  are  small  (polynomial  in 
the  size  of  the  flow  network),  there  are  Random  NC  algorithms  for  the  problem  by 
reduction  to  maximum  matching  ([KUW2],[MVV]),  ^Second,  there  is  a  simple 
sequential  method  for  testing  whether  an  instance,  (a,i>)  is  realizable  ([FF],[B]).  It 
is  based  on  partial  sums  of  the  sequences,  and  can  be  implemented  in  NC  in  a 
straightforward  manner.  However,  this  method  does  not  yield  a  way  of  construct¬ 
ing  a  realization.  This  is  another  example  of  the  apparent  gap  between  search  and 
decision  problems  in  the  parallel  realm  ([KUW1]). 

We  present  a  deterministic  NC  algorithm  for  the  matrix  construction  problem. 
Our  algorithm  can  be  implemented  to  run  in  time  0(log4|Mj)  using  0(|M|-(n  +  m)) 
processors  on  a  CRCW  PRAM,  or  in  time  0(log3|M|)  using  0(|M|-(n +  m)3)  proces¬ 
sors  on  an  EREW  PRAM,  where  M  is  the  realization  matrix  with  n  rows  and  m 
columns  and  \M\  is  the  size  of  M  (i.e.  n-m).  When  n  =  9(m)  the  number  of  proces¬ 
sors  is  0(|M|15).  and  0(|M|2  5)  respectively. 

The  algorithm  is  based  on  a  careful  examination  of  the  network  flow  formula¬ 
tion  of  the  problem.  It  exploits  the  fact  that  there  are  only  a  polynomial  number  of 
cuts  which  need  to  be  considered,  and  that  this  set  of  potentially  min  cuts  has  a 
natural  ordering  associated  with  it. 

The  methodology  we  develop  enables  us  to  solve  the  following  two  related 
problems  (with  the  same  time  and  processor  bounds): 

(1)  The  symmetric  supply-demand  problem  -  given  a  sequence  of  positive  and 
negative  integers  summing  to  zero,  representing  supplies  and  demands  respec¬ 
tively,  construct  a  zero-one  flow  pattern  so  that  the  net  flow  out  of  (into)  each 
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vertex  is  its  supply  (demand),  where  every  vertex  can  send  at  most  one  unit  of  flow 
to  every  other  vertex.  Notice  that  this  problem  is  quite  different  than  the  matrix 
construction  problem,  since  it  does  not  have  a  bipartite  nature. 

(2)  The  dieravh  construction  problem  -  construct  a  simple  directed  graph  with 
specified  in-  and  out-degrees.  This  corresponds  to  constructing  a  zero-one  matrix 
with  specified  row  and  column  sums,  where  the  diagonal  entries  are  forced  to  be 
zero.  [FF]  and  [B]  give  a  simple  sequential  algorithm  when  the  in-  and  out- 
degrees  are  sorted  in  the  same  order  (i.e,  a  vertex  with  higher  in-degree  has  higher 
out-degree).  Our  algorithm  is  the  only  one  we  know  of  for  general  orders  that  does 

not  use  max  flow. 

We  extend  our  results  to  the  case  where  the  input  represents  upper  bounds  on 
supplies  and  lower  bounds  on  demands. 

An  outline  of  the  paper  follows: 

In  section  2  we  explain  in  detail  our  methodology.  We  then  state  the  matrix 
construction  algorithm  formally  and  finally  discuss  its  parallel  complexity  (time 
and  processor  bounds). 

In  section  3  we  describe  how  our  techniques  can  be  used  to  yield  a  solution  to 
the  symmetric  supply-demand  problem. 

Section  4  contains  a  description  of  the  algorithm  for  the  digraph  construction 
problem. 

Finally,  in  section  5  we  describe  our  method  for  solving  the  supply-demand 
problems  when  we  are  given  upper  bounds  on  supplies  and  lower  bounds  on 

demands. 

A  few  words  about  parallel  algorithms.  Our  algorithms  use,  in  various  places, 
partial  sum  computations.  The  basic  problem  in  this  category  is  -  given  a  sequence 

Xl,  .  .  .  ,Xh,  compute  all  sums  of  the  form  This  problem  is  widely  mentioned 

l  =  1 

in  the  literature  (e.g.  [F],[MR])  and  we  will  not  discuss  it  in  this  paper,  other  than 
mentioning,  that  it  can  be  solved  efficiently  in  parallel.  Several  other  tools  that  we 
use  implicitly  are  finding  connected  components  ([SV])  and  various  algorithms  on 
trees  ([TV]). 


2.  The  Matrix  Construction  Problem 
2.1.  The  Slack  Matrix 

Our  parallel  algorithm  is  based  on  a  careful  analysis  of  the  network  flow  for¬ 
mulation  of  the  problem.  The  main  tool  we  use  is,  what  we  call,  the  slack  matrix 
which  is  similar  to  the  "structure  matrix"  of  Ryser  [R],  In  order  to  define  the  slack 
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matrix,  we  need  to  look  at  the  solution  to  our  problem  by  network  flow.  Given  the 
input  (<?,&)  :  ai>  a2>  •  •  •  >  an  ,  b2>  •  ■  ■  s  bm,  we  construct  a  flow  network, 

N,  as  shown  in  fig.  2.1:  the  vertex  set  consists  of  a  source,  s,  a  sink,  t,  vertices 
Ui  ,l<i<n  corresponding  to  rows  and  vertices  Uj  ,  1  <./  <  m  corresponding  to 
columns.  The  arc  set  contains  three  types  of  arcs:  for  all  1  Si  =£ n,  1  <  m  there 

are  arcs  (s,a,)  of  capacity  an  of  capacity  bj  and  ( u,,Vj )  of  capacity  1. 


Fig.  2.1  :  Flow  network  for  solving  the  0-1  matrix  construction  problem 
Let  S  =  la,  =  Clearly  the  max  flow  value  in  N  is  bounded  by  S .  Furth- 

i  =  l  j  —  l 

ermore,  a  flow  which  satisfies  all  rows  and  columns  sums  is  of  value  S.  It  follows 
(by  the  max  flow  -  min  cut  theorem)  that  the  problem  instance  (a,b)  is  realizable  if 
and  only  if  every  directed  cut  in  N  has  capacity  at  least  S. 

Let  C  =  (CS:C*)  be  a  directed  cut  in  N  (i.e.  the  vertices  are  partitioned  into 
two  sets,  C3,C‘  s.t.  s€C*,t€Ci).  Say  Cs  contains  x  vertices  from  the  set 
{u1(  .  .  .  ,u„}  and  m  -y  vertices  from  {uj,  .  .  .  ,um}.  Observe  that  if  we  replace  Uj  by 
u,  in  C\  for  some  i<j,  then  the  capacity  of  the  cut  can  only  decrease.  Similarly, 
replacing  uk  by  vt  in  C3  can  only  decrease  the  capacity  of  the  cut,  for  l>k.  It  fol¬ 
lows  that  the  capacity  of  C  is  no  less  than  the  capacity  of  the  cut  Cz<y,  where 
C3  y  =  {s}[J{u1,...,uJ\J{vy  +  1,...,vm}.  Thus  there  are  only  nm  cuts, 
j  i<x<n  ,  l<y  Sm},  which  are  potential  min  cuts.  The  cut  Cx  y  is  shown  in 
fig.  2.2.  Therefore,  necessary  and  sufficient  conditions  for  the  instance  (a, b)  to  be 
realizable  are  that  for  every  l<x  <n  1  <y<m: 

capacity(Cz y)  =  ^  a,  +  ^  bj  +  x-y  ^  S 

i=i+l  j =y + 1 

£  a,  +  (S-%b.)  +  x-y  s  S 

i=x+i  j- i 
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^  a:  —  ^  bj  +  x  -y  2  0 

i=x  +  l  j= l 

Definition:  The  slack  of  Cx  y  of  problem  instance  (a, b)  is: 

sl^gixj)  =  ^  +  xy 

i  =z  + 1  7=1 

The  slack  matrix ,  SL?i-,  is  the  matrix  whose  ij' th  entry  is  sl^ij). 

Proposition  2.1:  The  instance  (o', 6)  is  realizable  if  and  only  if  SL-g  is  non¬ 
negative. 

Proposition  2.2:  Let  (a,b)  be  an  instance  which  is  realizable  by  some  matrix,  M, 
and  assume  that  sl~g(xj)  =  0.  Then: 

(1)  Af[t  j]  =  1  for  all  l<i^x,l^j^y 

(2)  M[ij]  =  0  for  all  x  +  l<i<n  ,y  +  l<j<m 

Proof:  Since  sl~/xj)  =  0,  the  cut  Cx  y  has  capacity  S,  which  means  that  in  any 
max  flow  forward  arcs  (1)  are  all  saturated,  and  backward  arcs  (2)  all  have  zero 
flow.  This  situation  is  shown  in  fig.  2.2.  0 


Fig.  2.2  :  A  tight  cut  -  sl^x,y)  =  0 
All  forward  arcs  are  saturated;  All  backward  arcs  have  flow  0 


If  sl~^x,y)  =  0,  We  will  call  Czy  a  tight  cut.  Proposition  2.2  shows  that  existence 
of  a  tight  cut  simplifies  the  solution  considerably.  In  fact  it  gives  rise  to  a  divide 
and  conquer  approach:  if  CI  V  is  tight,  constructing  a  matrix  M[l:n,l:m]  for  the  ori¬ 
ginal  problem  is  reduced  to  constructing  the  two  sub-matrices,  M[x  +  l:n,l:y]  and 
M[l:x,y  +  l:m].  Of  course,  we  are  not  always  lucky  enough  to  have  a  tight  cut. 
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Our  approach  is  to  perturb  the  input  so  as  to  improve  our  luck!  Here  is  a  high- 
level  description  of  our  algorithm: 

(1)  Perturb  the  inputs,  (a,  V).  Call  this  new  instance  (3,  /?). 

(2)  Recursively  solve  the  instance  (3,  /?).  Call  the  solution  M  . 

(3)  Correct  the  matrix  M'  to  obtain  a  matrix,  M,  which  solves  the  original 
instance,  (a,  b). 

How  do  we  perturb  an  instance?  A  basic  perturbation  can  be  viewed  as  shifting  one 
unit  from  the  poor  to  the  rich  in  order  to  make  the  situation  tighter:  subtract  1 
from  ak  and  add  1  to  a,  for  some  k>l.  We  do  not  allow  that  a  perturbation  will 
change  the  ordering  of  the  a’ s,  so  it  is  necessary  that  ak>ak  +  1  and  at<at.  i  before 
the  perturbation. 

Remark:  We  will  be  discussing  only  perturbations  of  the  row  sums  (the  a,  s).  All 
this  discussion  holds  for  perturbation  of  the  column  sums  as  well. 

Proposition  2.3:  Let  (a,b)  be  a  problem  instance,  and  let  (3,£)  be  obtained  by 
shifting  one  unit  from  a*  to  a,  for  some  k>l.  Then  sl^(x,y)  =  sl^x,y)~  1  if 
/< x<k,  and  s/a/x,y)  =  sls^x,y)  otherwise. 

Proof:  This  can  be  seen  by  looking  at  the  formula  for  si.  [] 

This  proposition  shows  that  a  basic  perturbation  reduces  the  slack  of  a  certain 
set  of  cuts,  and  leaves  the  rest  unchanged.  This  observation  is  the  basis  for  our 

algorithm. 


2.2.  One  Phase  of  Perturbations 

Achieving  poly-log  recursion  depth  for  the  basic  algorithm  described  in  the 
previous  section  is  a  non-trivial  matter.  The  reason  is  that  it  is  hard  to  control 
which  cut  or  cuts  will  become  tight.  Furthermore,  since  we  have  limited  ourselves 
to  perturbations  that  do  not  change  the  ordering  of  the  a-  s,  it  is  not  clear  that  a 
tight  cut  can  always  be  obtained. 

Say  we  are  shifting  units  from  ak  to  a,  (for  some  k>l).  How  many  units  can 
we  shift?  Viewing  the  unit  shifting  as  a  sequential  process  (i.e.  shifting  one  unit  at 
each  time  step),  we  can  shift  until  one  of  three  things  happens. 

(1)  ai  becomes  equal  to  a;_j. 

(2)  ak  becomes  equal  to  ak  +  i. 

(3)  sl~£(x,y)  becomes  zero,  for  some  l<x<k. 

In  case  (3)  progress  is  made,  since  a  tight  cut  is  created,  and  we  can  split  the 
problem  into  two  smaller  problems.  What  about  the  first  two  cases?  We  observe 
that  we  have  possibly  reduced  the  number  of  different  a t  values.  This  observation 
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is  the  key  to  our  approach  for  performing  perturbations. 

Definition:  The  complexity  of  an  instance  ( a,b )  comp{a,b)  is  the  product  of  the 
number  of  different  at  values  and  the  number  of  different  bj  values. 

Our  parallel  algorithm  works  in  phases.  The  input  to  a  perturbation  phase  is 
an  instance  of  certain  complexity,  say  K,  and  the  output  is  one  or  more  instances, 
each  having  complexity  bounded  by  c-K,  for  some  constant  c<l.  Finally,  if  the 
complexity  of  the  input  is  less  than  a  certain  constant,  B,  we  construct  a  realiza¬ 
tion  for  it  (this  is  the  base  case).  We  proceed  to  describe  one  perturbation  phase.  In 
this  discussion  we  will  derive  the  constants  c  and  B.  For  better  exposition  we  will 
first  describe  a  phase  as  a  sequential  process.  The  parallel  implementation  will  be 
explained  later. 

In  each  phase  either  row  sums  or  column  sums  are  perturbed.  The  sequence 
that  is  perturbed  (row  or  column  sums)  is  that  which  has  a  larger  number  of 
different  values.  We  will  discuss  a  phase  in  which  row  sums  are  perturbed. 
Phases  in  which  column  sums  are  perturbed  are  essentially  identical. 

A  phase  starts  by  selecting  a  consecutive  set  of  active  rows,  {h,  h  +  1,...,/}  .  The 
parameters  h  and  l  depend  on  the  input,  ( a,b )  and  its  complexity,  K,  and  will  be 
derived  later.  Let  L=al  +  l  and  H=ah_v  The  perturbation  is  performed  as  fol¬ 
lows:  repeatedly  shift  units  from  the  lowest  active  row,  (initially  row  /),  to  the 
highest  active  row,  (initially  row  h).  A  row  becomes  inactive,  and  stops  sending  or 
receiving  units,  when  its  row  sum  either  drops  to  L  or  reaches  H.  The  phase  ter¬ 
minates  when  one  of  two  things  happens: 

(1)  At  most  one  active  row  is  left. 

(2)  sl^^ixj)  becomes  zero,  for  some  h  <x<l. 

In  case  (1)  no  tight  cuts  have  been  obtained,  but  the  row  sums  of  all  the  active 
rows  (except,  possibly,  one)  have  become  either  L  or  H.  Therefore  the  number  of 
different  row  values  decreases. 

In  case  (2)  one  or  more  tight  cuts  are  created,  and  the  instance  can  be  split,  using 
proposition  2.2,  into  two  smaller  instances  ("smaller",  in  this  case,  means  less  rows 
and  lower  complexity). 

Let  a,/J  and  y  be  the  number  of  different  values  in  the  sets  {a1(  .  .  .  _  t} , 

{ah,  .  .  .  ,a;}  and  {a;  +  1,  .  .  .  ,a„ }  respectively.  We  want  to  select  these  parameters  so 
as  to  minimize  the  complexity  of  the  outputs  of  the  phase: 

Case  (1)  :  The  number  of  different  row  sums  remaining  is  bounded  by  a  +  y+1 
(since  the  )S  values  corresponding  to  active  rows  disappeared,  except  for  at  most 
one). 

Case  (2)  :  Zero  slack  is  obtained  for  one  or  more  rows  in  the  range  [h,l-  1].  A  sim¬ 
ple  calculation  shows  that  the  number  of  different  row  sums  in  the  resulting 
instances  is  bounded  either  by  a  +  /?  +  l  or  by  fi  +  y+l. 
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Thus  we  need  to  minimize  the  maximum  of  a  +  l 3  +  1  ,  a  +  y  +  1  and  fi  +  y  +  1 
subject  to  a  +  P  +  y=K  (where  K  =  comp{a,b)).  The  solution  is,  of  course,  to  have 
a,/?  and  7  as  equal  as  possible,  i.e.  all  roughly  K/ 3.  From  this  calculation  one  can 
see  that  the  complexity  can  be  reduced  by  these  perturbations  as  long  as  the 
number  of  different  row  values  is  more  than  5. 

To  summarize,  if  the  input  to  a  phase  has  complexity  K,  the  outputs  have 

complexity  bounded  by  [— 1  +  1.  Thus  the  total  number  of  phases  is  0(log(n  m)). 

o 

The  base  case  is  any  instance  with  at  most  5  different  row  values  and  5  different 
column  values. 

We  next  discuss  the  parallel  implementation  of  one  perturbation  phase.  The 
first  step  is  to  calculate  the  new  row  sums  and  slack  matrix  under  the  assumption 
that  none  of  the  cuts  become  tight.  If  this  new  slack  matrix  is  strictly  positive 
then,  indeed,  we  are  in  case  (1). 

Let  p  be  the  initial  number  of  active  rows  (p  =  l-h  +  1).  After  the  phase 
(assuming  case  (1)),  there  will  be  q  rows  of  value  H,  p-q  +  1  rows  of  value  L  and 
one  row  of  value  /,  where  H  > I  ^-L .  q  and  I  are  easy  to  calculate. 

i>f-D 

I  * I 


/  =  ^(a  —  L)  mod  (H  —  L) 
i  =  h 

Let  mt  =  min  {sl~/i,y)  \  l^y^m},  and  let  m  '  be  the  new  minimum  slack  in 
row  i  after  the  phase  is  completed  (assuming  case  (U).  Then: 

For  h^i<h  +  q  m,'  =  m{  -  ^£( if -a, ) 

j=h 

For  h  +  q^i<l  mt'  =  m,  —  ^  (a.j  —  L) 

j  =  1  +  1 

If  all  the  m,'  are  positive,  then  we  are  provably  in  case  (1).  If  not,  we  need  to 
detect  at  what  "time  step"  (during  the  "sequential  process")  the  first  tight  cut  was 
created.  This  turns  out  to  be  a  simple  task  for  the  following  reason:  if  we  plot  the 
value  of  any  entry  in  the  slack  matrix  as  a  function  of  time,  it  decreases  by  one 
unit  each  step  until  some  point  in  time,  and  remains  constant  from  that  point  on. 
Thus  the  rows  where  the  first  zero  slack  occurs  are  the  rows  for  which  m  is 
minimum  among  the  rows  that  have  m'  ^ 0.  The  total  number  of  units  shifted  in 
the  phase  is  this  minimum  m  value.  It  is  easy  to  compute  the  new  row  sums  given 
the  number  of  units  shifted. 
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In  both  cases  ((1)  and  (2))  we  need  to  calculate  the  number  of  units  shifted 
from  row  j  to  row  i,  for  every  h<i<j<l.  (These  numbers  will  be  used  later,  in 
the  correction  phase.)  This  calculation  can  be  performed  by  a  simple  partial-sums 
computation. 


2.3.  Correcting  a  Perturbed  Solution 

After  a  realization  is  obtained  for  the  perturbed  instance  we  need  to  correct  it 
in  order  to  obtain  a  realization  for  the  original  instance.  Clearly  the  required  task 
is  to  shift  units  back  to  their  original  rows.  The  rows  which  participate  in  the 
shifting  of  units  are  divided  into  two  sets  -  the  donors  and  the  receivers,  where 
donors  shift  units  to  the  receivers  during  the  perturbation  phase,  and  get  them 
back  at  the  correction  phase.  Note  that  no  row  is  both  a  donor  and  a  receiver  in 
any  given  phase.  Let  s(/',t)  be  the  the  number  of  units  shifted  from  the  donor  j  to 
the  receiver  i  in  the  perturbation  phase. 

Definition:  Let  M  be  a  realization  matrix.  Sliding  a  unit  from  row  t  to  row  j 
means  changing  M[ i,k]  from  1  to  0  and  M\j,k]  from  0  to  1  ,  for  some  column,  k. 

Lemma  2.1:  Given  any  realization  of  the  perturbed  instance,  M',  it  is  always  pos¬ 
sible  to  correct  it  by  sliding  s(j,i )  units  from  receiver,  i,  to  donor,  j,  for  all 
receivers  and  donors. 

Proof:  Again  it  is  convenient  to  view  the  process  of  sliding  units  as  a  sequential 
one.  Assume  that  some  of  the  units  have  been  slid,  but  less  than  s(j,i)  units  have 
been  slid  from  row  i  to  row  j.  Call  the  current  matrix  Ml.  We  will  show  that  it  is 
possible  to  slide  a  unit  from  row  i  to  row  j  in  M i,  which  proves  the  lemma. 

Since  units  were  shifted  from  row  j  to  row  i  in  the  perturbation  phase,  it  is 
the  case  that  a,  was  no  larger  than  a,  before  the  phase  began.  Other  perturbations 
in  which  rows  t  and  j  might  have  participated  only  increased  the  row  sum  of  i  and 
decreased  the  row  sum  of  j.  Now,  since  less  than  s(j,i)  units  have  been  slid  from 
row  t  back  to  row  j ,  it  follows  that  row  i  has  more  l’s  than  row  j  in  Mv  By  the 
pigeonhole  principle  there  is  some  column,  k,  such  that  Mi[i,k]—  1  and  M ,k]  —  0. 

[] 

The  implication  of  the  proof  above  is  that  we  do  not  need  to  be  very  careful  in 
the  way  we  slide  units.  The  main  problem  we  need  to  solve  is  that  conflicts  may 
arise  when  we  slide  many  units  in  parallel.  This  could  happen  since  a  donor  might 
have  shifted  units  to  many  receivers,  and  a  receiver  might  have  received  from 
many  donors.  Our  goal  is  to  break  down  the  problem  into  a  set  of  independent 
problems,  which  can  all  be  solved  in  parallel.  The  first  step  is  to  get  a  formal 
description  of  the  donor-receiver  relation. 

Definition:  The  donation  graph  G  =  (D,R£)  is  a  bipartite  graph  with  a  vertex, 
dj£.D,  representing  each  donor  and  a  vertex  r,€i2  representing  each  receiver,  such 
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that  the  edge  is  in  E  if  and  only  if  s{j,i)>0. 

The  following  lemma  plays  a  key  role  in  simplifying  the  situation: 

Lemma  2.2:  The  donation  graph,  G,  is  a  forest. 

Proof:  Call  a  neighbor  of  a  vertex,  v,  nontrivial  if  it  has  at  least  one  other  neigh¬ 
bor  besides  v.  It  follows  from  the  way  the  perturbations  were  performed  that  each 
vertex,  v,  has  at  most  two  nontrivial  neighbors,  one  that  became  inactive  before  v, 
and  one  that  became  inactive  after  u.  Furthermore,  all  the  vertices  can  be  ordered 
according  to  when  they  became  inactive.  Therefore  G  cannot  contain  any  cycles. 

[] 

One  can  see  that  a  matching  in  the  donation  graph,  G,  corresponds  to  an 
independent  set  of  sliding  problems.  However,  there  is  no  guarantee  that  the  edges 
of  G  can  be  partitioned  into  a  small  set  of  matchings,  since  G  might  have  vertices 
of  high  degree.  Thus  a  more  subtle  partition  is  required. 

Definition:  A  constellation  is  a  subgraph  of  a  given  graph  all  of  whose  connected 
components  are  stars  (where  a  star  is  a  tree  with  at  most  one  non-leaf  vertex). 

Lemma  2.3:  The  edges  of  a  forest  can  be  partitioned  into  two  (edge-disjoint)  con¬ 
stellations. 

Proof:  It  suffices  to  show  that  the  edges  of  a  tree  can  be  partitioned  into  two  con¬ 
stellations.  Let  T=(VJS )  be  a  tree,  and  take  it  to  be  rooted  at  some  vertex,  P. 
The  level  of  a  vertex  is  its  distance  from  P.  v  is  the  parent  of  u  if  {u,v}ZE  and  v  is 
closer  to  P  than  v.  The  partition  of  T  into  two  constellations, 
Cl  =  (V,E1)  ,  C2  =  (V^2)  ,  is  as  follows: 

Ex  =  {  {u,v}  ]  u  is  the  parent  of  v,  the  level  of  u  is  even  } 

E2  =  {  {a, v}  |  u  is  the  parent  of  v,  the  level  of  u  is  odd  } 

An  example  of  such  a  partition  is  shown  in  fig.  2.3.  [] 


- 11  - 


Fig  2.3  :  Partitioning  a  tree  into  two  constellations 


Our  solution  is  based  on  the  observation  that  a  constellation  corresponds  to  a  set  of 
independent  sliding  problems  which  we  can  solve  in  parallel.  Therefore  our 
approach  will  be  to  partition  the  donation  graph  into  two  constellations  and  then 
to  slide  units  in  two  stages  -  first  corresponding  to  one  constellation  and  then  to 
the  other. 

A  star  in  the  donation  graph  corresponds  to  several  donors  with  a  common 
receiver  or  several  receivers  with  a  common  donor.  These  two  cases  are  symmetric, 
so  we  will  discuss  only  the  first  one.  In  what  follows  we  describe  a  parallel  algo¬ 
rithm  that  slides  all  the  units  corresponding  to  a  star  with  receiver  R  and  donors 
Du  .  .  .  ,Dd.  Let  M  be  a  realization  matrix  of  the  perturbed  instance  we  are  about 
to  correct.  Let  r  ,  dy,  .  .  .  ,dd  denote  the  number  of  1  s  in  rows  R  ,  D i>  ■  •  •  >^d 
respectively  and  let  s;  =  s{DitR).  We  need  to  slide  s,  units  from  R  to  Dit  for  all 
1  <  i  <  (f  in  parallel.  Our  approach  is  to  solve  a  matching  problem  in  the  following 
bipartite  graph,  B  =  (X,Y,E): 

X  =  { Xj  |  M[RJ]  =  1/ 

Y  =  {yiik  |  lSi<d  , 

E  =  {  {xjJi.k}  I  M[D,j]  =  0/ 

Lemma  2.4:  Every  matching  of  B  which  covers  all  the  vertices  in  Y  corresponds  to 
sliding  si  units  from  R  to  Dt,  for  all  l^t  <d  simultaneously. 

Proof:  By  construction,  there  are  ±  s,  vertices  in  Y,  one  corresponding  to  each 

l  —  1 

unit  that  was  shifted  from  some  Dt  to  R  There  is  an  edge  between  xj  and  ^  if 
and  only  if  a  unit  can  be  slid  from  row  R  to  row  £),  in  column  k.  The  claim  is, 
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therefore,  evident.  □ 

At  first  sight  it  seems  that  we  need  to  solve  a  maximum  bipartite  matching 
problem,  but  closer  observation  reveals  the  following: 

Lemma  2.5:  Every  maximal  matching  in  B  is  maximum. 

Proof:  It  suffices  to  show  that  any  matching  which  does  not  cover  all  the  vertices 
in  Y  can  be  extended.  The  degree  of  y;  *  in  B  is,  by  definition,  at  least  r  —  d,. 
Before  the  perturbation  phase  the  row  sum  of  R  was  no  less  than  that  of  row  Dr 

After  the  perturbations,  the  row  sum  of  R  increased  by  at  least  ^s(,  and  the  row 

1  =  1 

sum  of  Di  decreased  by  at  least  1.  Therefore: 

For  all  i,  k  degree(ylk )  >  r-dt  >  ^s,  +  1  =  |Y|  +  1 

1  =  1 

Since  any  matching  contains  no  more  than  |  Y|  edges  it  follows  that  no  partial 
matching  is  maximal.  [] 

A  maximal  matching  can  be  constructed  efficiently  in  parallel  ([IS],[Lu]).  Our 
parallel  algorithm  is,  therefore,  the  following:  construct  the  donation  graph,  and 
partition  it  into  two  edge-disjoint  constellations,  Ci  and  C <i-  For  63-ch  component  of 
Cl  construct  the  bipartite  graph,  B,  as  described,  and  find  a  maximal  matching,  F, 
in  it.  For  all  edges  of  B  do  in  parallel:  if  {xj,yi  k}iF  then  slide  a  unit  from  R  to  Dt 
in  column  j .  Finally,  repeat  this  procedure  on  C2  (with  the  updated  matrix). 

It  follows  from  lemmas  2.4  and  2.5  that  after  performing  these  operations  all 
the  perturbations  (of  the  current  phase)  are  corrected. 


2.4.  The  Base  Case 

The  base  case  for  our  algorithm  is  when  the  number  of  different  values  of  row 
and  column  sums  is  bounded  by  a  constant  (5).  The  problem  is  then  characterized 
by  the  different  values:  ait  •  •  •  ,a 5  and  61,  •  •  •  ,65  and  their  multiplicities 
m,  •  •  •  ,ns  and  mlr  ■  ■  ■  ,m5  respectively.  Let  M  be  the  realization  matrix  we  con¬ 
struct,  and  let  M. ,  be  the  submatrix  of  M  induced  on  the  rows  with  sum  a,  and 
columns  with  sum  bj.  We  construct  M  in  two  steps: 

Step  1:  For  each  tj',1  <  <5,  determine  the  number,  FiJt  of  units  in  MtJ. 

Step  2:  For  each  distribute  the  Fid  units  between  the  different  rows 

and  columns  of  MtJ, 

We  carry  out  step  1  by  constructing  a  flow  network  of  constant  size,  and 
finding  a  max  flow  in  it.  The  network  has  twelve  vertices:  a  source  s,  a  sink  t,  five 
"row"  vertices  u^,  •  •  •  ,1*5,  and  five  "column"  vertices  Wj,  •  •  •  ,115.  The  arcs  are  of 
three  kinds:  arcs  from  s  to  each  u,  with  capacities  n^a,,  from  each  Vj  to  t  with 
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capacities  mybj,  and  from  each  ut  to  each  vj  with  capacities  n^mj.  This  network  is 
simply  the  result  of  taking  the  original  network  flow  formulation  for  this  problem, 
and  compressing  all  "row"  vertices  with  equal  capacity  into  one  vertex,  and  simi¬ 
larly  for  "column"  vertices.  Since  this  network  is  of  constant  size,  a  max  flow  can 
be  constructed  in  constant  time  using  standard  sequential  methods. 

In  step  2  we  convert  the  solution  for  the  compressed  network  to  a  solution  for 
the  original  network  by  distributing  the  flow  along  each  compressed  arc  evenly 
between  the  arcs  it  defines.  We  do  this  by  providing  a  solution  for  the  following 
problem:  construct  so  that  xtJ  selected  rows  have  each  r(J  units,  ytj  columns 
have  each  c,  ,  units  and  each  of  the  remaining  rows  and  columns  have  rij-l  and 
c  —  1  units  respectively.  First,  it  is  not  hard  to  see  that: 

'•1J=h^Ll  xij=Fij  mod  rii 

yij=Fu  mod  mj 
mj 

Assume  we  want  each  of  the  first  xtJ  rows  and  first  y^j  columns  to  have  r,j  and  ctj 
units  respectively.  Our  solution  is  to  put  the  units  of  the  first  row  in  the  first  r,j 
columns,  the  units  of  the  second  row  in  the  cyclically  next  set  of  columns  etc.  An 
example  is  shown  in  fig.  2.4. 


I  l  l  l  l  l 
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Fig.  2.4:  Structure  of  MtJ  with  5  rows.  7  columns  and  13  units. 

Selected  rows  and  columns  are  marked  with  arrows. 

A  construction  for  arbitrary  sets  of  selected  rows  and  columns  (not  necessarily  the 
first  ones)  is  obtained  from  the  one  described  above  by  simply  permuting  the  rows 
and  columns  appropriately. 

Now  we  are  ready  to  construct  a  realization,  M,  for  the  base  case.  The  values 
F.j  determine  the  xtj  and  ytJ  values.  All  we  need  to  ensure  is  that  any  two  rows 
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(columns)  with  equal  row  (column)  sums  get  selected  the  same  number  of  times. 
This  can  be  done  by  selecting  the  first  xu  rows  in  Mi  X,  the  cyclically  next  set  of 
x,  2  rows  in  2  and  so  on,  and  similarly  for  columns. 


Since 


ix  = 


nl-al 


the  total  number  of  rows  selected  in 


j  =  i 


{Mu,  •  •  ,^5}  *s  an  inteSer  multiple  of  nh  and  it  follows  that  any  two  rows 
with  equal  row  sums  are  selected  the  same  number  of  times.  A  similar  argument 
holds  for  columns.  Thus  the  construction  described  yields  a  correct  solution  for  the 
base  case. 


2.5.  The  Algorithm 

In  this  section  we  state  the  algorithm  more  formally.  A  few  words  about  nota¬ 
tion:  I.P  is  shorthand  for  "in  parallel",  comments  are  between  double  parentheses; 
l:k  denotes  a  range  of  indices  (in  a  matrix  or  a  sequence);  ||  denotes  concatenation 
of  sequences;  #A  is  the  cardinality  of  the  set  A . 

procedure  MATRIX-CONSTRUCTION  (^,b) 

((  This  is  the  recursive  procedure  for  constructing  a  matrix,  M,  with  given  row 
sums,  a,  and  column  sums,  b.  The  row  and  column  sums  are  assumed  to  be  given 
in  a  non-decreasing  order.  )) 

(1)  Let  n  =  length  of  a;  m  =  length  of  b . 

(2)  Compute  V-and  Vf-  the  number  of  different  values  in  a  and  b  resp. 

(3)  If  and  Vf— 5  then  return  BASE -CASE  (a,b). 

(4)  (of  $,S,SL,pert,zerop)  =  PERTURBATION  (a,  b). 

(5)  If  not  zerop  then  M'  —  MATRIX— CONSTRUCTION  ia.$) . 

(6)  Else  let  x,y  be  such  that  SL[x,y]  =  0  and  either  ax  is  in  the  middle  third  of 
the  a  values  or  by  is  in  the  middle  third  of  the  b  values.  Do  the  following  I.P. 

(6.1)  I.P  set  M'[iJ]  =  1  for  all  1  <i<x  , 

(6.2)  I.P  set  M'[iJ]  =  0  for  all  x<i<n  ,  y<;<m. 

(6.3) 

M'[x  +  l:n  ,  l:y]  =  MATRIX-CONSTRUCTIONi  H[x  +  1  :n]  ,  ?[l:y]~*  ) 

(6.4) 

M'[l:x  ,  y  +  l:m]  =  MATRIX-CONSTRUCTIONi  o[l :x]-y  ,  ]![y  +  l:m]  ) 
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(7)  M  =  CORRECTION (M'  ,S, pert). 

(8)  Return  M . 

end  MATRIX-CONSTRUCTION 
procedure  PERTURBATION  (a,  6*) 

((  This  procedure  computes  one  perturbation  phase.  The  inputs  are  row  sums,  a, 
and  column  sums,  5*.  The  outputs  are  new  row  and  column  sums,  3  and  resp,  the 
slack  matrix  SL,  the  matrix  of  numbers  of  units  shifted  S,  a  variable  pert  indicat¬ 
ing  whether  row  sums  or  column  sums  have  been  perturbed  and  a  variable  zerop 
indicating  if  zero  slack  is  obtained.  )) 

(1)  Let  n  =  length  of  a;  m  =  length  of  b. 

(2)  Compute  V^-and  Vp-  the  number  of  different  values  in  a  and  b  resp. 

Tf  v  then  set  pert  =  "rows".  Else  set  pert  =  "columns"  and  perform  the 

rest  of  this  routine  with  b,  V^and  m  instead  of  a,  V  -  and  n  resp. 

(3)  Find  h  and  l  for  which  ah*ak-.x  ,  a^Oj  +  i  ,  and  the  number  of  different 

V,  vs 

values  in  <ax,  .  .  .  and  <a/,,  .  .  .  ,a;>  are  l  ~  j  and  f  ^  ]  resp.  Let 

H=ah- 1  and  L=ai  +  X. 

iiat-L) 

(4)  Compute  q  =  — 7 - J  and  /  =  ^(a,— L)  mod  ( H-L ). 

(5)  Compute  SL[iJ]  (( the  slack  matrix  ))  for  all  l<i<n  ,  l<j<m  I.P. 

(6)  Compute  ml  =  min  { SL[iJ ]  |  1-J-^  }  for  all  h^i^l  I.P. 

(7)  Compute  —  a-)  for  all  h^i<h+q  I.P. 

j=h 

(8)  Compute  m {  =  ml  —  ^  ( aj—L )  for  all  h  +  q  ^  i  <1  I.P. 

j  =  ‘  + 1 

(9)  If  OTi’>0  for  all  h  ^i<l  then  set  T  =  2  (a,  -L)  +  max  {0,ah  +  q  —  I}. 

i  —  k  +  q  +  1 

Else  set  T  =  min  {ml  \  m,'< 0/,  and  set  zerop  to  true. 

(10)  Initialize  S[tj]  =  0  for  all 

(11)  ( a,S )  =  SHIFT -UNITS(<ah,...,at>  ,T  ,H  ,L). 
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(12)  Set  a  =  <a1,...,aA_1>  ||  a  ||  <ai  +  l,  .  .  .  ,an>. 

(13)  Set  SL[iJ ]  =  SL[iJ]  —  £  max/O  ,ak-ak}  for  all  h<i<l  ,  l<;<m  I.P 

k  =h 

(16)  Return  (2,6  ,S  ,SL  ,pert). 
end  PERTURBATION 


procedure  SHIFT-UNITS (a,T ,H ,L) 

((  Shifts  a  total  of  T  units  between  active  rows  with  row  sums  a.  H  is  the  upper 
bound  on  new  rows  sums  and  L  is  the  lower  bound.  Returns  the  new  row  sums  and 
the  matrix,  S,  of  the  numbers  of  units  shifted  between  pairs  of  rows.  )) 

(1)  Denote  the  elements  of  a  by  ah,  .  .  .  ,a; 

(2)  Compute  for  all  1  <i^T  I.P: 

d,  =  max  {j  |  ^(ajt-L)  }  ((  donor  of  unit  i  )) 

*  -j 

rL  —  min  { j  \  i—  ^  (H  —aO  }  ((  receiver  of  unit  i  )) 

(3)  Compute  S[iJ]  =  #/ k  |  dk  =  i  ,  rk=j  }  for  all  h^j<i^l  I.P. 

(4)  Compute  at  =  a,  +  r,  —  d,  for  all  h^i^l  I.P. 

(5)  Return  (<?,S). 
end  SHIFT-UNITS 


procedure  CORRECTION {M ,S , pert) 

((  This  procedure  computes  one  correction  phase.  The  inputs  are  a  realization 
matrix,  M,  a  matrix,  S,  containing  amounts  of  units  to  be  slid  and  a  variable,  pert, 
indicating  if  units  need  to  be  slid  between  rows  or  columns.  The  output  is  the 
matrix,  M,  after  it  has  been  corrected.  )) 

(1)  Let  n  =  length  of  S. 

(2)  Construct  the  donation  graph,  G,  where: 

V(G )  =  {l . n}  E(G)  =  /  { ij }  |  S[iJ]> 0  } 

(3)  For  every  connected  component,  T ,  of  G  do  I.P: 
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(3.1)  Partition  T  into  two  constellations,  Cx  and  C2. 

(3.2)  Perform  SLIDE-UNITS(C ,M ,S ,pert)  for  every  connected  com¬ 
ponent,  C,  of  Ci  I.P. 

(3.3)  Perform  SLIDE -UNITS (C  ,M ,S ,pert)  for  every  connected  com¬ 
ponent,  C,  of  C2  IP. 

(3.4)  Return  M 
end  CORRECTION 


procedure  SLIDE-UNITS(C ,M ,S ,pert) 

((  Units  are  slid  in  the  matrix  M,  between  one  donor  and  many  receivers  or  one 
receiver  and  many  donors.  The  vertices  of  the  star,  C,  are  the  participating 
rows/columns  of  M.  The  matrix,  S,  contains  the  numbers  of  units  to  be  slid  and 
the  variable  pert  indicates  if  units  need  to  be  slid  between  rows  or  columns.  )) 

(1)  Let  c  be  the  unique  non-leaf  of  C  ((  If  C  has  exactly  two  vertices  let  c  be 
any  one  of  them  )).  Let  lv  .  .  .  ,ld  be  the  remaining  vertices  of  C . 

(2)  If  pert  ="rows"  then  let  MclMix,  ■  ■  ■  Mid  be  rows  c,lx,  .  .  .  ,ld  of  Af. 

Else  let  MC,M ,,,  .  .  .  ,Mid  be  columns  c,lx,  .  .  .  ,ld  of  M . 

(3)  If  S[c,lx]>0  ((  i.e.  c  is  a  donor  and  l{  are  receivers  ))  then  complement 

.  .  .  ,Mld  I.P,  and  set  comp  to  true. 

Let  s,  =  max/S[/i,c],S[c,/,]/  ((  the  number  of  units  to  be  slid  from  Mc  to  Mti  )) 
for  1  <i 

(4)  Construct  the  bipartite  graph,  B  =  {X,Y,E): 

X  =  {Xj  |  Mc\j)=  1} 

Y  =  fy,*  |  lssisd  ,  ls*ssi/ 

E  =  |  Mti\j]  =  0} 

(5)  Compute  F,  a  maximal  matching  in  B. 

(6)  For  all  {xj,yl<k}(iF  do  in  parallel:  set  Mc\J]  =  0  and  Mt\j]  =  l. 

(7)  If  comp  then  complement  .  .  ■  ,Mid  I-P. 

(8)  Copy  MC,M/  ,  .  .  .  ,Mid  back  into  their  original  location  in  M  ((  see  step  (2) 
))• 
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end  SLIDE-UNITS 

procedure  BASE-CASE  (a  ,b) 

((  Constructs  a  matrix,  M,  with  row  sums  a  and  column  sums  b,  where  the  number 
of  different  values  of  elements  in  a  and  b  is  at  most  five.  )) 

(1)  Let  ax>  •  •  •  >ak  and  bx>  •  •  ■  >6;  be  the  values  of  the  elements  of  a 
and  b  resp.,  and  let  nx,  .  .  .  ,nk  and  mx,  .  .  .  ,mx  be  their  respective  multi¬ 
plicities. 

(2)  Construct  a  flow  network,  N,  with  vertices  s,  t,  uu  .  .  .  ,  uk,  o1;  .  .  .  ,  o; 

and  the  following  arcs  (for  all  1 S  i  £  k  ,  1  ^  /  ): 

from  s  to  u,  with  capacity 
from  Vj  to  t  with  capacity  mybj 
from  ui  to  Vj  with  capacity  nl  m] 

(3)  Find  a  max  s  —  t  flow  in  N.  For  all  ij  let  F be  the  flow  on  the  arc  (u^Vj). 

(4)  For  all  ij  construct  MtJ  as  shown  in  figure  2.4.  There  are  FtJ  mod  nt 

selected  rows,  starting  at  row  (  A  + 1)  mod  (cyclically)  and 

h-  1 

Fi  ;  mod  m,  selected  columns,  starting  at  column  +  mod  nt. 

,J  J  h  =  l 

(5)  Let  M  be  the  appropriate  concatenation  of  the  Mtj  s. 

(6)  Return  M . 
end  BASE-CASE 

2.6.  Parallel  Complexity 

The  time  and  processor  bounds  of  our  algorithm  depend  on  how  we  chose  to 
implement  the  maximal  matching  routine.  Two  competing  implementations  are 
given  in  [IS]  and  [Luj.  On  a  graph  with  e  edges,  Israeli  and  Shiloach’s  algorithm 
takes  time  0(log3e)  and  uses  0(e)  processors  on  a  CRCW  PRAM.  Luby’s  algorithm 
requires  only  0( log2e)  time  on  an  EREW  PRAM,  but  uses  0(e~)  processors.  It  is 
straightforward,  though  somewhat  tedious,  to  verify  that  all  the  other  operations 
in  one  phase  of  MATRIX-CONSTRUCTION  can  be  performed  with  the  resources 
required  for  maximal  matching  (in  both  the  implementations  listed  above). 
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There  are  0(log|M|)  phases  (as  proven  in  section  2.2).  In  a  correction  phase 
for  rows  there  are  O(n)  parallel  calls  to  maximal  matching  on  bipartite  graphs 
with  0(m2)  edges  each.  When  columns  are  corrected,  there  are  0(m )  calls,  each  of 
size  0(n2).  Thus  the  number  of  processors  required  is 
0(nm(n+m))  =  0{\M\-{n  +  m))  using  [IS],  and  0(nm(n  +  m)3)  =  0(\M | in  +  m)3) 
using  [Lu].  When  n  =0(m)  the  processor  requirements  are  0(| Af | 1  5)  and  0{\M\  ) 

respectively. 


3.  The  Symmetric  Supply-Demand  Problem 

In  this  section  we  will  show  how  the  methodology  developed  in  section  2  gives 
rise  to  a  parallel  algorithm  to  the  symmetric  problem.  Here  the  input  is  a 
sequence  of  integers,  s  fn ,  summing  to  zero.  The  goal  is  to  construct 

a  flow  pattern  in  which  every  vertex  can  send  up  to  one  unit  of  flow  to  any  other 
vertex  such  that  the  flow  out  of  u,  minus  the  flow  into  it  is  (for  all  l<t  ^n).  The 
goal  can  be  viewed  as  constructing  an  nXn  zero-one  matrix,  M  (where  M[iJ]  is 
the  amount  of  flow  sent  from  vertex  i  to  vertex  j)  such  that,  for  all  t,  the  the 
number  of  ones  in  row  i  minus  the  number  of  ones  in  column  i  is  fr  Note  that 
changing  the  values  along  the  main  diagonal  does  not  change  the  instance  M 
describes,  so  they  can  all  be  set  to  zero  at  the  end  of  the  computation. 

Again  we  start  with  a  network-flow  formulation  for  the  problem.  The  flow 
network  has  n  +  2  vertices:  s,  t,  vx,  .  .  .  ,  vn.  If  fi> 0  then  there  is  an  arc  from  s  to 
v j  with  capacity  fit  and  if  ^<0  then  there  is  an  arc  from  to  t  with  capacity  fi. 
Also,  there  is  an  arc  with  capacity  1  from  ut  to  Uj  for  all  1  <ij<n.  Examination  of 
this  network  shows  that  there  are  only  n  potential  min  cuts:  of  all  cuts  containing 
x  vertices  with  s,  the  one  containing  .  .  .  ,vx  is  of  smallest  capacity.  Thus,  for 
this  problem  we  have  a  slack  vector.  An  analysis  similar  to  the  one  in  section  2 
shows  that,  for  all  1  <x  S  n: 

si  Ax)  =  x-(n-x)  -  j £/”, 

'  1  =  1 

It  is  interesting  to  note  that  here,  as  opposed  to  the  matrix  construction  problem, 
the  object  describing  the  slacks  (a  vector  of  length  n )  has  a  different  size  (and 
dimension)  than  the  object  being  constructed  (an  nXn  matrix). 

A  perturbation  phase  is  performed  in  the  same  way  as  in  section  2,  except 
that  there  is  only  one  sequence  being  perturbed  (as  opposed  to  separate  row  and 
column  sequences).  Again  we  have  the  property  (similar  to  proposition  2.3)  that 
shifting  a  unit  from  j  to  i  (i<j)  decreases  the  slacks  at  entries  i,  i  + 1,  .  .  .  j  - 1, 
and  does  not  change  the  other  entries. 
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A  correction  phase  is,  however,  trickier  than  before.  The  reason  is  that  if  a 
unit  is  to  be  returned  from  entry  i  to  entry  j,  it  can  be  done  either  by  sliding  a 
unit  from  row  i  to  row  j  or  by  sliding  a  unit  from  column  j  to  column  i.  The 
equivalent  of  lemma  2.1  holds  here,  but  for  each  unit  only  one  of  the  two  ways  of 
sliding  listed  above  is  guaranteed  to  exist.  Furthermore,  if  we  simultaneously  try 
to  slide  units  in  rows  and  in  columns,  conflicts  may  arise. 

Our  solution  is  to  perform  the  correction  in  two  stages:  first  slide  between 
rows,  then  slide  between  columns.  The  first  stage  is  identical  to  a  row-correction 
phase  of  section  2.  The  only  difference  is  that  the  maximal  matching  computed 
does  not  necessarily  cover  all  the  vertices  of  one  side  of  the  bipartite  graph,  B. 
After  the  first  stage,  we  update  the  donation  matrix  (the  s(ij)s),  according  to  the 
numbers  of  units  slid  in  the  first  stage.  We  then  perform  a  column-correction 
phase  for  the  resulting  problem. 

Lemma  3.1:  Every  maximal  matching  computed  in  the  second  stage  is  maximum. 
Proof:  As  in  section  2.3,  let  R,Du...,Dd  be  the  vertices  of  a  star  in  the  dona¬ 
tion  graph.  Let  Bx  =  (X1,Y1,El)  be  the  bipartite  graph  for  sliding  between  the  rows 
corresponding  to  these  vertices  in  the  first  stage.  Let  B2  =  {X2,Y2,E2)  be  the  bipar¬ 
tite  graph  for  sliding  between  the  columns  corresponding  to  these  vertices  in  the 
second  stage.  Then,  as  in  the  proof  of  lemma  2.5,  for  each  vertex  in  Y2,  the  sum  of 
its  degrees  in  Bx  and  B2  is  at  least  | Yx\  +  l.  It  follows  that  the  degree  of  every 
such  vertex  in  B 2  is  at  least  |  F2|  +  1.  Q 

Corollary  3.1:  Every  unit  that  is  perturbed  gets  slid  in  one  of  the  two  stages. 

The  base  case  is  solved  along  the  same  lines  described  in  section  2.4,  but  a  few 
more  details  need  to  be  handled.  The  base  case  is  when  there  are  at  most  five 
different  values,  fx>  •  •  •  >f5,  with  respective  multiplicities  nv  .  .  .  ,n5.  Again 
we  start  by  finding  a  max  flow  in  a  constant  size  network  (having  7  vertices  - 
s,  t,  v  1,  .  •  •  ,  V5)  to  determine  the  number  of  units,  FlJt  in  Mtj.  Now,  as  opposed  to 

the  previous  case,  %FtJ  needn’t  be  an  integer  multiple  of  nt.  Therefore,  after 
j  =  i 

distributing  unit  evenly  between  all  rows  with  the  same  f  value  (as  described  in 
section  2.4),  some  of  these  rows  will  have  p  units  and  some  will  have  p  —  1  units 
(for  some  appropriate  p).  Similarly,  not  all  the  columns  with  the  same  f  value  will 
necessarily  have  the  same  number  of  units.  We  overcome  this  obstacle  by  observ¬ 
ing  that  if  i  and  j  have  the  same  f  value,  and  if  row  sum  i  is  greater  by  one  than 
row  sum  j  then  column  sum  i  should  be  greater  by  one  than  column  sum  j.  There¬ 
fore,  the  problem  is  solved  by  (using  terminology  of  section  2.4)  selecting  rows  and 
columns  in  the  same  order. 

Finally  we  note  that  the  algorithm  for  the  symmetric  problem  uses  the  same 
resources  (time  and  number  of  processors)  as  the  matrix  construction  algorithm 
(see  section  2.6). 
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4.  Digraph  Construction 

In  this  section  we  describe  our  solution  for  the  problem  of  constructing  a  sim¬ 
ple  digraph  with  specified  in-degree  and  out-degree  sequences.  By  "simple"  we 
mean  no  self  loops  and  no  parallel  arcs.  Notice  that  if  self  loops  are  allowed,  this 
problem  is  exactly  the  matrix  construction  problem  described  in  section  2.  The 
digraph  construction  problem  can  be  stated  as  follows:  given  two  equal-length 
sequences,  (o1;  .  .  .  ,on)  and  (iu  .  .  .  ,in)  ,  (that  are  not  necessarily  sorted!),  con¬ 
struct  an  nXn  zero-one  matrix,  M,  that  has  ok  l’s  in  row  k  and  i*  1  s  in  column  k 
(for  all  so  that  all  the  elements  on  the  main  diagonal  of  M  are  zero. 

Our  solution  is  based  on  the  algorithm  described  in  section  2.  We  start, 
again,  by  looking  at  the  network  flow  formulation  for  this  problem.  The  network  is 
almost  identical  to  the  one  in  figure  1,  except  that  each  vertex  on  the  left  is  miss¬ 
ing  one  outgoing  arc,  and  each  vertex  on  the  right  is  missing  one  incoming  arc.  It 
is  convenient  to  view  the  missing  arcs  as  existing  arcs  with  capacity  zero.  We  will 
call  these  blocked  arcs  and  the  corresponding  entries  in  the  realization  matrix 
blocked  entries.  Our  first  goal  is  to  show  that  in  this  case  too  there  are  only  n 
potential  minimum  cuts.  Let  fljS  •  •  •  >an  and  •  •  •  *bn  be  the  sorted 

sequences  of  out-degrees  and  in-degrees  respectively  (i.e.  a  is  obtained  by  sorting 
o  and  b  by  sorting  73,  and  let  N  be  the  network  corresponding  to  a  and  b  (similar 
to  the  one  shown  in  figure  1).  The  capacity  of  the  cut  Czy  (as  shown  in  figure  2)  is, 
in  this  case: 

capacity  {Czy)  =  ^  a;  +  bj  +  x  y  —  B(x,y) 

1  =  1  +  1  j=y  +  i 

where  B(x,y)  is  the  number  of  blocked  arcs  crossing  the  cut.  Since  there  is  at  most 
one  blocked  entry  in  every  row  and  every  column,  a  simple  argument  shows  that  if 

az>az  +  i  and  by>by  +  l  then  this  cut  has  the  smallest  capacity  among  all  cuts  for 

which  the  s  side  contains  x  vertices  on  the  left  and  n-y  vertices  on  the  right. 
However,  if,  say,  az=az  +  1  then  a  the  cut  obtained  by  switching  vertices  uz  and 
uI  +  1  might  have  smaller  capacity,  since  the  number  of  blocked  arcs  crossing  it 
could  be  greater  by  one.  Therefore,  if  we  want  the  cuts  Czy  to  be  the  only  poten¬ 
tial  minimum  cuts,  we  need  to  be  careful  about  the  ordering  of  "row"  vertices 
corresponding  to  rows  with  equal  row  sums,  and  similarly  for  columns.  The  condi¬ 
tions  we  need  to  enforce  on  the  order  are,  simply:  if  az  =  az  +  1,  then  the  blocked 
entry  in  row  x  should  be  in  a  lower-indexed  column  than  the  blocked  entry  in  row 
x  +  1.  The  symmetrical  conditions  should  hold  for  columns. 

These  conditions  can  be  obtained  by  two  rounds  of  sorting:  first  sort  rows 
according  to  row  sums.  Sort  rows  with  equal  sums  according  to  the  corresponding 
column  sums  (i.e.  the  correspondence  given  by  the  o  and  t  sequences),  breaking 
ties  arbitrarily.  Now,  sort  the  columns  according  to  column  sums.  Columns  with 
equal  sums  are  sorted  according  to  the  order  of  the  corresponding  rows  that  was 
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obtained  in  the  first  round.  No  ties  can  arise,  since  there  is,  at  this  point,  a  total 
ordering  of  the  rows. 

After  this  preprocessing  is  done,  we  are  ready  to  proceed  along  the  same  lines 
as  the  algorithm  described  in  section  2,  with  a  few  modifications.  The  slack  func¬ 
tion  is  now: 

slffaj)  =  £  a,  -  +  x-y  -  B(x,y) 

i=z- 1-1  j  =  1 

By  the  discussion  above,  it  is  again  true  that  an  instance  is  realizable  if  and  only  if 
its  slack  matrix  is  non-negative.  If  sUg(x,y)  =  0  then  M[tj]  =  l  for  all 
l<i<x,lSy^y  except  for  blocked  entries,  and  M[ij]  =  0  for  all  i  +  isi<n 

,  y + 

The  perturbation  phases  work  identically  here,  since  they  only  deal  with  the 
row  and  column  sums,  and  not  with  the  internal  structure  of  the  realization 
matrix. 

In  the  correction  phases  there  is  a  small  modification  -  units  should  not  be  slid 
into  blocked  entries.  This  is  fixed  by  modifying  the  bipartite  graph,  B,  in  the  obvi¬ 
ous  way.  Also,  we  need  to  re-examine  the  proof  of  lemma  2.5.  It  works  out  exactly 
right  in  this  case,  since  it  turns  out  that: 

for  all  i,k  degree{ylk)  s  \Y\ 

which  is  precisely  sufficient  (see  the  original  proof). 

The  only  tricky  modification  turns  out  to  be  for  the  base  case.  Again,  there 
are  at  most  five  different  row  sum  values  and  five  different  column  sum  values. 
The  difficulty  is  that  there  are  blocked  entries  scattered  throughout.  This  spoils  the 
simple  cyclic  realization  that  existed.  We  overcome  this  by  partitioning  the  matrix 
into  finer  sub-matrices  than  in  the  previous  case.  Each  of  the  MtJ’ s  is  partitioned 
further  so  that  each  sub-matrix  either  contains  no  blocked  entries,  or  contains  a 
blocked  entry  in  every  row  and  column. 

Again  we  construct  a  realization  in  two  steps.  The  first  step  is  to  determine 
the  total  number  of  units  in  each  sub-matrix.  This  is  done,  here  too,  by  solving  a 
max  flow  problem  (where  the  capacity  of  a  sub-matrix  is  the  number  of  non- 
b locked  entries  in  it).  Again,  the  network  here  is  of  constant  size,  so  a  max  flow 
can  be  computed  in  constant  time.  In  the  second  step,  the  units  are  distributed 
within  the  sub-matrices.  The  key  here  is  to  deal  first  with  the  sub-matrices  con¬ 
taining  blocked  entries.  It  is  not  always  possible  to  select  arbitrary  sets  of  rows 
and  columns,  but  it  is  possible  to  distribute  the  units  so  that  the  discrepancy 
between  any  two  rows  or  any  two  columns  will  be  at  most  one  unit.  This  can  be 
done  as  follows:  say  the  blocked  entries  are  along  the  main  diagonal  (this  will 
actually  always  be  the  case  because  of  the  preprocessing),  and  let  k  be  the  number 
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of  rows  (and  columns)  of  the  sub-matrix.  Let  dr  (the  r’th  diagonal)  be  the  set  of 
entries,  (ij),  for  which  j—i  -  r  ( mod  k).  If  F  units  are  to  be  distributed,  fill 

d.  dr,  and  place  the  remaining  units  in  d  r  •  An  example  is  shown  in 
l’  '  Ijl  l71  +  1 

figure  4.1. 


Figure  4.1:  A  5X5  sub-matrix  with  blocked  entries  containing  11  units. 

Now,  after  the  "problematic"  sub-matrices  have  been  dealt  with,  we  can  construct 
the  sub-matrices  with  no  blocked  entries  in  the  same  fashion  as  described  in  sec¬ 
tion  2.4.  The  same  arguments  for  proving  validity  of  the  scheme  go  through, 
because  there  is  at  most  one  blocked  entry  in  every  row  or  column. 


5.  Bounds  on  Supplies  and  Demands 

Our  parallel  algorithm  for  the  matrix  construction  problem  can  be  extended  to 
the  case  in  which  the  sequences  a  and  b  represent  upper  bounds  on  row  sums  and 
lower  bounds  on  column  sums  respectively.  This  is  a  natural  extension  of  the 
matrix  construction  problem  when  rows  represent  supplies  and  columns  represent 

demands. 

Let  U  =  and  L-$  br  Let  M  be  a  realization  matrix  for  the 

1=1  i=1 

instance  (a, V),  and  let  S  be  the  number  of  l’s  in  M.  Then,  clearly,  L^S^U.  Say 
we  fix  S.  Then  the  problem  boils  down  to  the  following:  modify  the  sequences  a 
and  b  to  obtain  a  and  respectively  so  that: 

(1)  al^al  and  bj^fij  for  all  l^i^n  , 

(2)  =  S. 

i -  i  j  =  i 

(3)  (a,  is  realizable. 

It  is,  of  course,  not  always  possible  to  satisfy  all  three  conditions  simultane¬ 
ously.  Thus  our  goal  is  find  such  a  pair  of  sequences  if  it  exists. 
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The  key  for  obtaining  the  sequences  3  and  is  to  consider  the  slack  matrix, 
as  defined  in  section  2.1.  Recall  that  the  condition  for  realizability  is  that  all  the 
slacks  are  non-negative,  and  that: 

+  x'y 

1=1+1  j= 1 

where  ■  •  San  and  S  ■  s/lm. 


Lemma 

sequence 


5.1:  Let  axs  •  •  •  >an  and  6X>  •  •  •  s6m.  Let  3(fc)  (0{l))  be  the 
obtained  from  a  (6)  by  subtracting  1  from  ak  (adding  1  to  bt).  Then 


(3(1), ^(m))  is  realizable  if  (3(A),/?(/))  is  (for  any  1  <k  < n  ,  1  ^ m). 


Proof: 


5  (a( l)i-a(k)i)  +  j£(fl(l)j-fl(m)j) 

i= Z+l  J=1 


It  is  easy  to  see  that  for  all  values  of  x,y,k  and  l  this  difference  is  non-negative, 
which  proves  the  lemma.  □ 


Theorem  5.1:  Let  3(S)  be  obtained  from  a  by  repeatedly^subtracting  1  from  the 
largest  element  U-S  times  and  let  £(S)  be  obtained  from  b  by  repeatedly  adding  1 
to  the  smallest  element  S-L  times.  Then  (3(S),^(S))  is  realizable  if  there  is  any 
realizable  pair  of  sequences  (?,*)  where  y^a,  ,  Sj^bj  (for  all  l<isn,l<jSm) 

and  5  7,  =  Sfy  =  S- 
.=i  j= i 

Proof:  By  induction  on  U  —  S  using  lemma  5.1  [] 

(3(S),/?(S))  can  be  obtained  from  (a,£)  efficiently  in  parallel  by  a  simple 
partial-sums  computation.  The  algorithm  is. 


(1)  For  all  S  ,  l^S^U  ,  do  I.P: 

(1.1)  Compute  3(S)  and  /?(S). 

(1.2)  Test  if  (3(S),^(S))  is  realizable  ((  using  the  method  described  in  [FF]  )). 

(2)  Select  an  S  for  which  (3(S),/?(S))  Is  realizable. 

(3)  Compute  M  =  MATRIX— .CONSTRUCTION  {!%(§),. 

Steps  (1.1)  and  (1.2)  are  simple  partial-sum  computations,  and  can  be  imple¬ 
mented  using  Oin  +  m)  processors.  Since  steps  (1)  and  (2)  can  be  implemented 
within  the  time  and  processor  bounds  used  for  step  <3),  the  algorithm  has  the  same 
parallel  complexity  as  the  matrix  construction  algorithm.  Note  that  we  may  per¬ 
form  step  (2)  with  some  criterion  in  mind  (e.g.  "construct  a  matrix  with  the  smal¬ 
lest  possible  number  of  l’s  subject  to  ..."). 
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The  extension  of  the  symmetric  supply-demand  problem  turns  out  to  be  even 
simpler.  Here  the  natural  extension  would  be  that  all  the  f  values  represent  upper 
bounds,  since  making  a  number  "less  positive"  corresponds  to  less  supply,  and 
making  a  number  "more  negative"  corresponds  to  more  demand.  So  in  an  instance 
of  this  problems,  the  positive  number  would  sum  up  to  +H  and  the  negative 
number  would  sum  up  to  —  L,  for  some  H>L. 

Here,  as  opposed  to  the  matrix  construction  problem,  it  is  clear  which  value  of 
S  works  best  (where  S  is  the  sum  of  the  positive  entries,  and  minus  the  sum  of  the 
negative  entries).  By  looking  at  the  expressions  for  the  slack  vector,  one  can  see 
that  decreasing  S  cannot  ruin  feasibility.  Therefore  S  should  be  selected  to  be  as 
small  as  possible,  i.e.  S  =  L. 

To  summerize,  only  the  positive  f  entries  should  be  modified.  Again,  as  in  the 
matrix  construction  problem,  the  best  way  to  modify  these  numbers  is  to  repeat¬ 
edly  subtract  one  unit  from  the  largest  entry  until  H-L  units  have  been  sub¬ 
tracted. 
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